Toward Automatic Compilation of Phrasal Thesaurus
Abstract
A thesaurus, which links linguistic expressions (or concepts) based on various semantic relations, is one of the most fundamental semantic resources for a broad range of NLP tasks. Much work has relied on thesauri such as WordNet (Miller, 1995) and automatically created versions of it. The entries of most existing thesauri are either single words or word sequences, including phrasal verbs and canned phrases. However, such thesauri may not be sufficient for dealing with meaning, because the meaning of a polysemous word is determined only when it is used in some context, and the meaning of a phrase, clause, or sentence is determined not only by the meanings of its constituent words but also by its construction. For more precise semantic computing, we have proposed the phrasal thesaurus, which regards phrases as entries (Fujita et al., 2007). While the term “phrase” generally refers to word sequences such as phrasal verbs and canned phrases, in our study the notion also includes predicate phrases that involve complements. Among the various types of semantic relations between phrases, we have mainly been addressing paraphrase and textual entailment. This paper describes the direction and current status of our study on compiling a phrasal thesaurus.
Similar resources
A Compositional Approach toward Dynamic Phrasal Thesaurus
To enhance the technology for computing semantic equivalence, we introduce the notion of a phrasal thesaurus, which is a natural extension of the conventional word-based thesaurus. Among the variety of phrases that convey the same meaning, i.e., paraphrases, we focus on syntactic variants that are compositionally explainable using a small amount of atomic knowledge, and develop a system which dynamical...
A Conceptual Framework For Automatic And Dynamic Thesaurus Updating In Information Retrieval Systems
This paper presents a methodology for automatic thesaurus construction to support document search; the aim is to develop classes for specific topics (for a given corpus) without a priori semantic information. Information contained in the thesaurus leads to new search formulations via automatic and/or user feedback. This presentation even being theoretica...
Application of automatic thesaurus extraction for computer generation of vocabulary questions
Automatic thesaurus extraction techniques are applied to computer-generated related word vocabulary questions. These questions assess and provide practice for an aspect of word knowledge found to be important for language learning. Automatic generation of such questions reduces the need for human authoring of practice materials. In evaluations with real teachers, most of the generated questions...
Toward conceptual indexing using automatic assignment of descriptors
Indexing techniques have reached a mature state. Digital libraries and other digital collections make intensive use of these algorithms to store and retrieve documents. On the other side, we have browsing techniques, which let the user gather the information. Current approaches are not yet advanced enough to satisfy the user. At CERN we are working on an indexer based on th...
Extending a Thesaurus with Words from Pan-Chinese Sources
In this paper, we work on extending a Chinese thesaurus with words distinctly used in various Chinese communities. The acquisition and classification of such region-specific lexical items is an important step toward the larger goal of constructing a Pan-Chinese lexical resource. In particular, we extend a previous study in three respects: (1) to improve automatic classification by removing dupl...